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Abstract. We show that the linear dependence on p of the running time of 
Kedlaya's point-counting algorithm in characteristic p may be reduced to p 1 / 2 . 



1. Introduction 

In [KedOlj . Kedlaya introduced an algorithm for computing the zeta function 
of a hyperelliptic curve over F p n of genus g > 1, which was remarkable for having 
running time polynomial in g and n. Kedlaya did not discuss the dependence of 
the running time on p, and indeed at first it was thought that the algorithm would 
be practical only for very small primes. Later it was found that the dependence on 
p was roughly linear f |GG03| . see also the survey paper |Ked04j ). 

The main step of Kedlaya's algorithm — the step where the linear dependence 
of the running time on p occurs — involves computing a p-adic approximation to 
the matrix of the p-th power Frobenius acting on a certain basis for the Monsky- 
Washnitzer cohomology of the curve (more precisely, the curve minus a few points). 
This is a 2g x 2g matrix with entries in the degree n unramified extension of Q p . 
Kedlaya computes this matrix to precision p N in time 0(pN 2 g 2 n), where the 'soft- 
oh' notation O(X) indicates 0(X(\ogX) k ) for some k > 0. 

Our main result is the following. Let u) denote the exponent of matrix multipli- 
cation; that is, to is a real number such that m x m matrices over a ring R may be 
multiplied using 0(m UJ+E ) ring operations in R for any e > 0. Trivially one can take 
u> = 3; see [Str69] for the simplest example of a matrix multiplication algorithm 
that achieves u> < 3. 

Theorem 1. Let N > 1, and suppose that 

(1) p>(2N-l)(2g + l). 

Then the entries of the above matrix may be computed to precision p N in time 

0(p 1/2 N 5/2 g"n + N 4 g 4 n\ogp). 
In particular, for fixed N, g and n, the running time is Oijp 1 / 2 ). 

Our new algorithm is therefore superior to Kedlaya's original algorithm for fixed 
g and N and large enough p, but inferior for fixed p and large enough N or g. The 
final step of Kedlaya's algorithm is to compute the characteristic polynomial of the 
above matrix, but the running time of this step is only logarithmic in p, and will 
not concern us further. 

The purpose of the assumption p > (2N — l)(2g+l) is to simplify the analysis of 
denominators. It could be weakened somewhat, but the algorithm would become 
more complicated. 

l 
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The author was motivated to develop this algorithm, not for point-counting 
purposes, but rather because of the role that the above matrix plays in the fast 
computation of p-adic heights of points on elliptic curves, as described in [MST06j . 
In that application, the parameter N plays quite a different role. In [KedOlj . the 
aim is to compute the characteristic polynomial of Frobenius to sufficient precision 
that its exact value is pinned down by the Weil conjectures. Consequently Kedlaya 
takes N — O(gn) and expresses all running time estimates in terms of g and n 
alone. On the other hand, in MST06J, there is no reason to tie N to n or g. 
Indeed, g = 1 for an elliptic curve, and taking n — 1 suffices to handle curves 
defined over Q. Rather, the choice of N ultimately depends on how accurately one 
wishes to determine the p-adic height. Therefore, in this paper we will analyse the 
dependence on N separately from that of n and g. 

Our basic approach is the same as in [KedOlj : starting with an explicitly given 
basis of differentials for the Monsky-Washnitzer cohomology, we compute a rep- 
resentation of the action of an explicitly chosen lift of Frobenius on each basis 
differential, and then we apply a reduction algorithm that uses the cohomology re- 
lations to express the images as linear combinations of the original basis elements, 
thereby obtaining the desired matrix. 

However, our algorithm differs from that of |Ked01] in two important respects. 
First, we make the key observation that the reductions in cohomology are given by 
formulae which may be interpreted as solving a linear recurrence with polynomial 
coefficients. Therefore, instead of performing the reduction steps 'one at a time', 
it becomes possible to use a baby-step/giant-step algorithm of Chudnovsky and 
Chudnovsky CC88] to execute a whole sequence of reductions in far less time than 
it would take to perform the reductions consecutively. 

Second, to exploit this idea we must use a different representation for the relevant 
differentials. The difficulty is that in [KedOlj . the images of the basis differentials 
under Frobenius are approximated by series whose number of terms is at least linear 
in p, making it impossible to reach a running time proportional to p 1 ^ 2 . We will 
use instead a different series approximation whose number of terms depends only 
on g and N, not on p. 

Rather than using the Chudnovskys' algorithm as they presented it, we take 
advantage of a modification due to Bostan, Gaudry and Schost [BGS07, that im- 
proves on the running time by a factor logarithmic in the length of the recurrence. 
In our setting this translates to a speedup of 0(\og(pN)), which for the feasible 
range of p is very significant. 

The relationship between our algorithm and the paper [BGS07] runs somewhat 
deeper. As one of the principal applications of their improved techniques for solving 
recurrences, they give an algorithm for computing the zeta function of a hyperellip- 
tic curve over a finite field. Their approach is quite different to Kedlaya's, relying 
on the representation of the entries of the Hasse-Witt matrix associated to the curve 
y 2 = f{ x ) as certain coefficients of the polynomial f(xY p ~ 1 ^ 2 (mod p). They then 
use the Chudnovskys' idea to efficiently compute those selected coefficients, without 
computing the whole polynomial. It is striking that the Chudnovskys' algorithm 
plays such a central role in these two quite different approaches to computing zeta 
functions. 

Our algorithm improves on the zeta function algorithm of U3GS07j in several 
ways, all of which may be traced to our essentially p-adic viewpoint. Whereas we 
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obtain the zeta function modulo p for any N > 1, their algorithm is only able 
to recover the zeta function modulo p, and they must then use other methods, 
such as ^-adic methods, to obtain further information [BGS071 pp. 1800-1801]. 
Furthermore, they achieve a running time of 0(p 1 / 2 g 3 ^ 2+UJ n) [BGS071 Theorem 17], 
which falls behind our algorithm by a factor of g 3 ^ 2 (ignoring the term involving 
logp). The factor of g 3 ^ 2 may be accounted for as follows. In both our algorithm 
and the algorithm of [BGS07] , it is occasionally necessary to divide by p. To prevent 
precision loss at these division steps, }BGS07j are forced to lift from working modulo 
p to working p-adically, artificially introducing O(g) safety digits |BGS07| p. 1798]. 
In our setting, the extra p-adic digits are "already there" , and it is simply a matter 
of analysing the propagation of p-adic error terms. This explains a factor of g. 
The remaining factor of g 1 ^ 2 is more technical; essentially it occurs because our 
"reduction matrices" (see $5|) have certain p-adic analyticity properties that reduce 
the total number of matrices we must compute (see §7.2.1|) . 

Hubrechts [Hub07 , following a suggestion of Lauder, recently showed how to 
combine Kedlaya's algorithm with Dwork's deformation theory to improve the as- 
ymptotic running time with respect to n (although the dependence on g becomes 
worse) . It would be interesting to study whether our approach to handling large p 
is compatible with these developments. 

Organisation of the paper. In $2] wc fix notation, and in Sj3] we outline Ked- 
laya's original algorithm. In $4] we give our alternative expression for the action 
of Frobenius on the appropriate differentials. In $5] we reformulate certain coho- 
mological reductions as linear recurrences. In $6]we give a slight generalisation of 
the algorithm of [BGS07] for solving linear recurrences. In Sj7]we describe the main 
algorithm, prove its correctness, and analyse its complexity. Finally, in Sj5]we give 
some examples of timings for an implementation of the algorithm. 

Acknowledgements. Many thanks to Kiran Kedlaya for supplying the first clue 
that led to this algorithm, and for many helpful discussions about his algorithm, 
particularly regarding the thorny questions of precision loss. I would also like to 
thank William Stein for introducing me to the problem of computing p-adic heights, 
and for supplying the hardware on which the sample computations were performed 
(funded by NSF grant No. 0555776). Thanks to Barry Mazur, Kiran Kedlaya, 
Karim Belabas, William Stein, and an anonymous referee for several helpful com- 
ments on an early version of this paper. 

2. Notation and setup 

We will follow the notation of [KcdOl fairly closely. Let p > 3 be a prime, and 
let q — p n for some n > 1. The finite fields with p and q elements are denoted by 
F p and F g . We denote by Q 9 the unramified extension of Q p of degree n, and by 
Z q its ring of integers. 

Let Q £ F q [x] be a monic polynomial of degree 2g + 1 (g > 1) with no multiple 
roots, so that the equation y 2 = Q(x) defines the (projective) hyperelliptic curve 
C/F q of interest. We select an arbitrary lift Q £ Z q [x] of Q(x), also monic and of 
degree 2g + 1. (Note that in the application to computing p-adic heights [MST06], 
the input data is actually Q itself, rather than just Q.) 

Let 

A = Fq[x,y 1 y- 1 }/(y 2 -Q(x)); 
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this is the coordinate ring of the curve C obtained from C by removing the point 
at infinity and the points whose abscissae are the zeroes of Q(x). Let 

A = Z q [x,y,y- 1 ]/{y 2 - Q{x)) 

be the lift of A associated to Q(x). Let be the weak completion of A; explicitly, 
A^ is the ring of power series 

such that Vp(di i j) — * oo at least linearly in max(i, |j|). 

We will work mainly in the module Q~ of differentials of A^ over Q g on which 
the hyperelliptic involution acts by — 1. Explicitly, these are expressions of the form 

s>o tez 

where the a St t are subject to the same decay condition as above. Two differentials 
£ 17 _ are cohomologous, denoted w ~ r], if there exists some / e A* ® Q, 
such that lo — r\ — df. We define the reduction of to to be the unique differential 
T) = B(x)dx/y, cohomologous to u>, such that the degree of B £ Q q [x] is at most 
2g — 1. The existence and uniqueness of 77 follows from the fact that {x l dx/y} i £^ 
forms a basis for the Monsky-Washnitzer cohomology [KedOll p. 329]. 

We lift the p-power Frobenius on F q to A^ as follows. On Z q , we take the 
canonical Witt vector Frobenius. We set x a = x p , 



(2) (y-l^^VI^WW'-W 



DC 



and y a = (j/ -<r ) -1 . The above series converges in A* (because Q(x) a — Q(x) p is 
divisible by p), and the definition ensures that a is an endomorphism of A\ We 
further extend a to fl~ by <r(/ dg) = f rT d{g a ). 

3. A SKETCH OF KEDLAYA'S ORIGINAL ALGORITHM 

In this section we will briefly describe Kedlaya's algorithm, paying particular 
attention to the dependence of the running time on p. 

He begins by computing an approximation to y~ a of the form 

iT-«y-'E Mx) 



where each A^ has degree at most 2g. It is an approximation in two senses: it is 
truncated at a certain power of y~ 2 , and the coefficients are represented modulo 
p N , for some appropriately chosen TV' (slightly larger than N) . Note that the time 
committed is already proportional to at least p, for the number of terms in the 
above series is about Np. 

Next he takes the basis {x l dx/y} 2 ^ 1 for the de Rham cohomology of A (actually, 
for its minus eigenspace under the hyperelliptic involution). Using the above series 
expansion of y~ a ', he computes an approximation to the image of each basis element 
under Frobenius, 

(3) a(x i dx/y) = x pz d(x p )y- a = px pl+p - 1 y- (T dx 
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as a series of the form 

(4) a{x i dx/y)p ,J2^-dx/y, 

where each Fj has degree at most 2g, and where again the series have about Np 
terms. 

For each i, he then applies a reduction algorithm to the terms on the right hand 
side of (HI). At each step, he uses the identities y 2 = Q(x) and 2ydy = Q'(x)dx, 
together with the fact that d(x s y t ) = in cohomology for any s and t, to reduce 
the term Fj{x)y~ 23 dx/y to a lower power of y~ 2 (or in some cases, y 2 ). The terms 
are swept up sequentially until reaching j = 0. At this point one has computed 
the reduction of a(x z dx/y), whose coefficients give the (i + l)-th column of the 
Frobenius matrix. The reduction step is performed once for each j, so again the 
total time is proportional to at least p. 

4. The Frobenius action on differentials 

As noted above, one of the barriers to making Kcdlaya's algorithm run in time 
less than linear in p is that the series approximation for a (x 1 dx/y) given by (U|) 
has about Np terms. The following proposition gives a different approximation for 
<j(x l dx/y) that requires only 0(N 2 g) terms; in particular, the number of terms 
does not depend on p. 

Proposition 2. Let Cj jr £ Z q be the coefficient of x r in Q(x) 3 . For < j < N, 
let 

^> = ^E(-i) fe+J ("f)g) ez, 

k—j 

For < i < 2g, set 

N-l (2fl+l)j 

(5) T t = J2 B^+^-'y-P^+^dx/y. 

Fhen the reduction of Fi agrees modulo p N with the reduction of a(x % dxjy). 
Proof. From $2$j and {3) we obtain 

(6) a(x l dx/y) = Y>( ~ 1/2 ) (Q(x) CT - QixYfx^+^y-^+^+Ux/y. 

fc=o ^ k ' 

Since Q(x) a — Q(x) p is divisible by p, the k-th term Uk of ([6]) is of the form 

U k ^p k+1 F{x)y-^ 2k+1 '>dx, 
where F S Z q [x] has degree at most 

((20+ l)p- l)k+pi+p- 1 < (2g + l)(k + \)p. 
By repeatedly dividing F{x) by Q(x) = y 2 , we may rewrite this as 

{k+i)p-\ 

U k =P k+1 F j (x)y- p V ik+1 ') +2j dx, 

3=0 

where each Fj £ Z g [x] has degree at most 2g. 
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We must show that the coefficients of the reduction of Uk are divisible by p , 
for all k > N. The terms for which < j < (k + ^)p may be handled by [KedOll 
Lemma 2] , which shows that the reduction of Fj {x)y^ p ^ 2k+1 ^> +2 ^ becomes integral on 
multiplication by p 1 +L lo s P ( 2 ' £ + 1 )J . Assumption implies that [log p (2fc+ 1)J < k — 
N, which covers this case. The remaining terms for which (k+^)p < j < (k+l)p— 1 
require |Ked01| Lemma 3]. (Note: Lemma 3 as stated in [KcdOlJ is incorrect. A 
corrected version is in the errata to [KedQT], and a proof is given in Lemma 4.3.5 
of |Edi03j .) For these j we find that the reduction of Fj(x)y~ p ( 2k+1 ) +2j dx becomes 
integral on multiplication by p m where 

m = Llog p ((2 ff + l)(-p(2fc + 1) + 2j + 2) - 2)J < [log p ((2g + l)p)J < 1, 

the last inequality again depending on {!]). 

Consequently the terms in © for k > N do not contribute modulo p N to the 
reduction of a(x l dx/y) : so we may ignore them. Therefore, let 

W -! /_1/0\ 

T * = E P\ l W X Y - Q{x) p ) k ^ l+P ^y^ {2k+1)+1 dx/y. 

We now replace Q(x) p by y 2p , use the binomial formula to expand (Q(x) a — y 2p ) k , 
and write out the coefficients Q{x) a explicitly in terms of the Cj>. After rearranging 
the summations, we obtain the representation for Tj indicated in the statement of 
the proposition. □ 

Remark. Ultimately, the linear contribution of p to the running time of Kedlaya's 
original algorithm arises from explicitly expanding out the Q(x) p term in a formula 
of the above type. In the proof of Proposition^ we avoided this by substituting y 2p 
for Q(x) p , and we will see that our algorithm will accordingly never need to compute 
the coefficients of Q(x) p . At first glance this may seem odd, since in Kedlaya's 
original algorithm, the expansion of Q(x) p — more precisely, the congruence modulo 
p between Q(x) p and Q{x) a — is precisely what causes the terms in y" with high 
powers of y~ 2 to have p-adically small coefficients. In our case however, one finds 
that the reduction of each term Bj ir x pl ^ l+r+1 ^ 1 y~ p ^ 2: ' +1 ^ +1 dx/y of Tj generally 
contributes to all N digits of the coefficients of the reduction of Tj, regardless of 
the value of r or j. In fact, even the sum of all terms for a given power of y~ 2 (that 
is, for a given j) contributes to all N digits. It is almost as if our algorithm ignores 
the decay conditions defining A* . Of course those decay conditions do play a role, 
by inducing hidden cancellations among the Bj^ r . 

5. Horizontal and vertical reduction 

Let s > — 1 and t S Z. We define W s .t to be the Q 9 -vector space of differentials 
of the form 

F(x)x s y- 2t dx/y, 

where F(x) E Q q [x) has degree at most 2g. In the case s = —1, we impose the 
additional condition that the constant term of F{x) must be zero (so that none of 
the differentials ever involve negative powers of x). 

In ^5. H and i|5.2l we will give maps between the various W Sj t that send differentials 
to cohomologous differentials. The point is to give explicit formulae, so that the 
maps may be interpreted as defining linear recurrences. First we discuss 'vertical' 
reductions, which map W-i t t to W-ij-t; this is the main type of reduction that 
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appears in |Ked01 . Then we discuss 'horizontal' reductions, which map W s ,t to 
W s -i,t- The aim is to eventually reduce everything to W-i,q, since this space 
consists of the differentials of the form G(x)dx/y, where G has degree at most 
2g-l. 

We will generally identify elements of W St t with vectors in Z 2g+1 (or Z 2g in the 
case s = —1), with respect to the basis {x l+s y~ 2t dx/y} 2 *L Q (or with respect to 
{x l y~ 2t dxjy) 2 ^ 1 in the case s — —1). 

5.1. Vertical reduction. Let < i < 2g and t E Z. Since Q(x) has no repeated 
roots, we can find polynomials Ri,Si G Z q [x], where degi?^ < 2g — 1 and deg5j < 
2g, such that 

(7) x l = R l (x)Q(x) + Si(x)Q'(x). 

(To get the integrality of Ri and Si, we have used the assumption that p > 2g + l, so 
that the leading coefficient of Q'(x) is a unit.) Using the relation 2ydy = Q'(x)dx, 
we have 

x l y- 2t dx/y = R t {x)y- 2t+2 dx/y + 2S l (x)y~ 2t dy. 
Since d(Si(x)y~ 2t+1 ) is zero in cohomology, after a little algebra we find that 

(8) jy-^ly J*- 1 ^ 28 ^ ,-*^. 

(The above calculation is essentially the one in [KedOll p. 329].) 

This last relation may be rephrased in terms of the vector spaces W-t t t as follows. 

Proposition 3. Let 

M v (t) : W_i, t -> W-x.t-j 

6e i/ie linear map given by the 2g x 2g matrix whose (i + l)-th column consists of 
the coefficients of the polynomial (2t — l)Ri(x) + 2S'i(x). Let 

D v (t) =2t-l. 

Then for any u> € W-xj, we have 

w~Dv(*) _1 Mv(i)w (GW_i, t -i). 

In other words, Dy{t)~ 1 Mv {t) is the reduction matrix for transporting a differ- 
ential from W-i t t to a cohomologous differential in W—i t—\. Note that the entries 
of My(t) are linear polynomials in Z q [t], as is Dy(t). 

We will be interested in iterating this process. For to < ti, let 

Afv(to,ti) : W_i, tl W_ Mo 

be defined by 

M v (i , ii) = M v {t Q + l)M v {t + 2) ■ • • Mv(t x ). 

Similarly let 

Dv(*o,*i) = *M*o + l)Dv(*o + 2) • • • Dv(*i). 
With this notation we obtain: 

Proposition 4. for an?/ a; G W-ifa, 

w~Dv(to,ti)- 1 M v («o,t 1 )w (e W-i lt0 ). 
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Example 5 (An elliptic curve). We compute Mv(t) for the elliptic curve y 1 
Q(x) = x 3 + ax + b. First solve ([7|) for i = 0, 1, obtaining 

R {x) = A- 1 (-18oa; + 276) 

S (x) = A-^Gax 2 - 9bx + 4a 2 ) 

Ri{x) = A- 1 (276x + 6a 2 ) 

Si (a) = A" 1 (-96a: 2 - 2a 2 x - 6a6), 

- 4a 3 is the discriminant of the curve. Therefore 
l)R (x) + 2S' Q {x) = A -1 (-6aa:(6i - 7) + 96(6i - 5)) 
(2* - l)i2i(i) + 2S[{x) = A _1 (9&2:(6f - 7) + 2a 2 (6i - 5)), 
and so the matrix My(i) is given by 



where A 



276 2 
(2*- 



M v (t) 



96(6* -5) 2a 2 (6i-5) 
-6a(6* - 7) 96(6i - 7) 



5.2. Horizontal reduction. Let s > and f G Z. In cohomology, 
~ d(x s y- 2t+1 ) 
= sx s - l y- 2t+1 dx - (2* - l)x s y~ 2t dy 



sx s -"Q{x) - -(2* - l)x s g'(a;)J y- M dx/y. 
Decompose Q(a;) as 

Q(sb) =x 29+1 + P(sb), 
where P 6 Z 9 [x] has degree at most 2g. After substituting this into the previous 
equation and rearranging, we obtain 



(9) x s+Z9 y- zt dx/y 
Proposition 6. Let 



(2g+ 1)(2* - 1) - 2s 



M| r (s) : W s>t -> U. 



&e */ie linear map given by the matrix 

( 

d^OO o 

Af^a) = D h( s ) 



Co(s)\ 

C x {s) 

C 2 (s) 



V o 







DM C 2g (s)J 



whe 



D t H (s) = (2g + l)(2t-l)-2s, 



and where Ch(s) is the coefficient of x h in the polynomial 

C(x, s) = 2sP(x) - (2* - l)xP'{x). 
Then for any u> € W St t, we have 

u ~ D^(a)- 1 M| r (a)w (e W s _i, t ). 

Proof. The bulk of the statement follows from (J9]). In addition, the constant term 
of C(x,0) is zero, so Mjj(Q) does indeed map into W-\,t- □ 
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Note that, for a fixed choice of t, the entries of Mjj(a) and Djj(s) are linear 
polynomials in Z 9 [s], and D l H (s) does not vanish for any s, since it is always odd. 
To iterate this process, we define, for — 1 < sq < s±, 

Mjj(a ,ai) : W Sl , t -> W So , t 

by 

M H (a , at) = M H (a + l)M H (a + 2) • • • M* B (ai), 

and 

D^(s , «i) = D* H (a + l)D H {a + 2) • ■■D t H (s 1 ). 

We obtain: 

Proposition 7. For any cl> G W si ,t, 

lj-D^(so,si)- 1 M^(so,si)w (e W» ,t). 

Example 8 (An elliptic curve). We compute D H (s) and M H (s) for the elliptic 
curve y 2 — Q(x) — x 3 + ax + b. We have 

D^(s) = 6t-2s-3, 

and P(x) = aa; + 6, so 

2sP(x) - (2i - l)xP'(x) = ax(2s - 2t + 1) + 6s. 
Then Mjj-(s) is given by 

/ 26s \ 

M H (a) = 6i - 2s - 3 a(2s-2i+l)J. 

\ 6i-2s-3 / 

6. Algorithms for linear recurrences 

The following theorem from |BGS07j is not precisely what we will need, but it 
is close enough that we will be able to adapt it without difficulty. To state it, we 
need to introduce some notation from BGS07J. Let R be a commutative ring with 
identity. In this section we will work in an algebraic model of computation, so 
running times are measured by counting ring operations in R. We denote by M (d) 
the time required to multiply polynomials of degree d over R, and by M M (m) the 
time required to multiply to x m matrices with entries in R. In BGS07J they make 
several reasonable regularity assumptions about the growth of M(d) and MM (to), 
which are certainly satisfied in the cases we will consider. 

For any integer s > 0, they define a certain quantity D(l, 2*,2 S ) € R. The 
definition is straightforward, but lengthy, and we will not give it here. The only 
fact we need (see |BGS07i p. 1787]) is that if 2, 3, . . . , 2 s 1 + 1 are units in R, then 
D(l, 2 s , 2 s ) is invertible in R, and that its inverse may be used to efficiently recover 
the inverses of certain other elements of R that are needed in the interpolation steps 
of their algorithm. 

Now, let M(X) be an to x to matrix of linear polynomials in i?[A], and let K > 1 
be an integer. Given an initial vector Uq £ R m , they define a sequence of vectors 

by 

U i+1 =M{i + l)Ui 

for i > 0. If one wishes to compute several Ui in the range < i < K, the 
naive algorithm (simply iterating the above relation) requires time 0{m 2 K). The 
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following theorem improves substantially on this, as long as not too many Ui are 
requested. 

Theorem 9 ( [BGS071 Theorem 15]). Let < K t < K 2 < ■ ■ ■ < K r = K be 

integers, and let s = [l°g4 K\ . Suppose that 2, 3, . . . , 2 s + 1 are invertible in R, 
and that the inverse of D(1,2 S ,2 S ) is known. Suppose also that r < K?~ £ , with 
< e < 1/2. Then Uk x , ■ • ■ , Uk t can be computed using 

0(MM(m)VK + m 2 M(\/~K)) 

ring operations in R. 

The theorem we require is a little stronger. Using similar notation to the hori- 
zontal and vertical reduction matrices of <j5l we define 

M(k, k') = M(k')M(k' - 1) • • • M(k + 2)M(k + 1) 

for k < k' . (Note that we have switched the ordering of the matrices from <|5l to 
match the notation of |BGS07| . It is trivial to adapt the algorithm to work in the 
opposite direction.) Instead of just computing the images Uk 1 , . . . , Uk t of a single 
vector Uk, our aim is to compute the matrices M(Ki, Li) for a sequence of intervals 
(Ki, Li). The following slight generalisation of Theorem [5] achieves this. 

Theorem 10. Let 

< K 1 < L x < K 2 < L 2 < ■ ■ ■ < K r < L r < K 

be integers, and let s — [log4 K\ . Suppose that 2, 3, . . . , 2 s + 1 are invertible in R, 
and that the inverse of D(1,2 S ,2 S ) is known. Suppose also that r < Kz~ e , with 
< e < 1/2. Then M(K\, L\), . . . , M(K r , L r ) can be computed using 

0(MM(m)VK + m 2 M(V~K)) 

ring operations in R. 

Remark. When we prove the main complexity result (Theorem [1} we will ignore 
the distinction between the two terms in the above estimate. The key point is that 
the running time is soft-linear in ^/~K, and polynomial in m. 

Proof. The algorithm is almost exactly the same as the one given in the proof of 
BGS07, Theorem 15], so we will not spell out all the details. To explain it, we 
first give a very high-level sketch of their algorithm. In "Step 0" , they compute a 
sequence of matrices 

(10) M(0, H),M(H, 2H), M((B - 1)H, BH), 

where both H and B are a small constant factor away from y/K. They apply 
these matrices successively to Uq to compute UkH for all < k < B. Each target 
index Ki will fall within one of the intervals [kH, (k + 1)H]. Then they perform 
a "refining" step, where they deduce from UkH by evaluating appropriate 
products of M(X) over (much smaller) subintervals of [kH, Ki]. To stay within the 
time bounds, they use multipoint evaluation techniques to refine towards all target 
indices simultaneously. 

(The main reason that their algorithm is a logarithmic factor faster than the 
Chudnovskys' algorithm is that in Step 0, they give up some control over which 
intervals are computed, in exchange for having available a faster method for com- 
puting them. This is why the separate refining step is necessary.) 
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To adapt this to our needs, we need only perform a little extra work. Given the 
input indices Ki and Li, we compute the sequence (|10|) . using the same method as 
BGS07]. We now perform a refining step using the same algorithm as in [BGS07J. 
but we will need to refine over more intervals. Suppose that Ki lies in \k\H, {k\ + 
1)H] and that Li lies in [k2H, (k^ + 1)H], where k^ > k\. If k\ = then we refine 
over [Ki, Li}. If k2 > k\, we must refine over both [Ki, (k% + 1)H] and \kiH-, Li]. 

After computing the products M{k, k') for each of these intervals, we must per- 
form an additional 'gluing' step. Namely, each of our target intervals (Ki,Li) is a 
union of intervals (k, k') for which M(k, k') has been computed (either in Step 
or in the refining step), and so we simply multiply together the M(k, k') for those 
intervals, in the appropriate order. 

To estimate the total time, we note first that our 'Step 0' is identical to their 
'Step 0'. The refining steps take at most twice as long as theirs, since we have at 
most doubled the number of intervals to be considered, and the lengths of those 
intervals satisfy the same bounds. One must also check the invertibility conditions 
in R; these are still satisfied since they depend only on the maximum length of the 
intervals, which has not changed. Finally, the extra gluing step consists of at most 
0{\/~K) matrix multiplications, costing time 0(MM(m)VK), which fits within the 
required time bound. □ 



7. The main algorithm 

In this section we describe the main algorithm for computing the Frobenius 
matrix. The basic idea is to start with the approximation Tj for a{x l dx/y) given 
by Proposition [2j and then to use the reduction maps to push each term towards 
W-ifl. Theorem [TU] is used to efficiently compute the reduction maps. 

Figure Q] illustrates the strategy in the case g = 1 and N = 3. Each vertex corre- 
sponds to a W s ,ti and the arrows correspond to horizontal and vertical reductions. 
The black vertices are those which are the starting point for at least one term from 
some Ti. (There are additional vertices and arrows used in the algorithm that for 
reasons of clarity are not shown on the diagram.) 



5p-l 
2 



-1 

o - 



p-1 

— • -6 



t V 



2p-l 
— • — 



3p-l 



s 

ip-1 



5p-l 



6p-l 



7p-l 



8p-l 
— • 



O 



Figure 1. Reduction strategy for g = 1 and N = 3 



One of the more magical aspects of Kedlaya's original algorithm is the way that 
p-adic precision losses propagate through the calculation. Although one needs to 
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perform about N divisions by p, Kedlaya shows that in fact only 0(log p N) spare 
digits of precision must be carried. 

A similar argument applies to our algorithm, and since we have assumed p to 
be sufficiently large compared to g and N, it turns out that only one spare digit 
is necessary. However, some caution is required. For example, the product of all 
the Mjj(s) across a whole 'row' of the horizontal reductions will generally be zero 
modulo p N , and therefore one must interleave the multiplications by M B (s) and 
divisions by D t H {s) in such a way that the denominators can "catch up with" the 
build-up of p-divisibility of the numerators. In §7.21 we perform a more detailed 
analysis, showing how to do almost all of the work with no spare digits at all. In 
practical terms, avoiding even this single extra digit yields enormous savings in 
time and memory when N is small. For the vertical reductions, at least in the case 
N > 1, this kind of analysis seems much more difficult, and consequently we will 
retain the spare digit. 

7.1. Preliminaries. The algorithm works in two different rings, Rq = Z q /(p N ) 
and R\ — Z q /(p N+1 ). At certain stages we will need to compute a/b, where b is 
not a unit; we may take the result to be any c satisfying be = a. We will see below 
that such divisions will always be possible in Z q when they occur, and that the 
errors introduced do not contribute to the final result modulo p N . 

As a preliminary step, we compute the coefficients Bj r given in Proposition [21 
for < j < N and < r < (2g + as elements of R\. 
Let us write Ti as 

N-l i+(2g+l)i+l 

T id , k = B^^xPK^y-P^+^dx/y, 

where for convenience we declare that Bj. r = for r < 0. Note that T"i,j,fc S W p k-i,t, 
where t = \ {{2j + l)p- 1). 

7.2. Horizontal reduction phase. This phase is performed once for each < 
j < N; throughout this section we regard j as fixed. 

Let t = |((2j + l)p-l). The aim is to use the horizontal reduction maps to find 
differentials Wij € W-i,t that are cohomologous to Tij, and whose coefficients are 
correct modulo p , for < i < 2g. 

7.2.1. Computing the reduction maps. Let L = (2g + l)j + 2g. We must first 
compute the horizontal reduction matrices 

M(k)=M t B ((k-l)p,kp-2g-2), 

,IJl D(k) = D H ((k- l)p,kp-2g-2), 

for 1 < k < L, with entries in Rq. (Once computed, it may be convenient to lift 
them to i?i, but it is only necessary to know them modulo p .) 

This is accomplished in two steps. We will discuss M(k) only; the D(k) are 
handled entirely analogously. 

The first and most time-consuming step is to use Theorem [TU] to compute M(k) 
for 1 < k < L', where V = min(N,L). To verify the invertibility hypotheses of 
Theorem [TU1 we must check that VK + 1 < p, where K = L'p — 2g — 2 is the total 
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length of the interval containing all the reduction intervals. From ([TJ) we know that 
(2fl + 1)(2JV — 1) <p- 1, so 

2K < (2g + 1)(2N - l)p + (2g 

< (p-l)p+(p-l)p-2p- 

<2(p-l) 2 , 



l)p - 2p - 4g - 4 
4o -4 



from which the desired inequality follows. 

The second step is to deduce the remaining M(k) for N < k < L. (This is of 
course only necessary when L > N.) It is possible to simply use Theorem HOl again, 
but it is much more efficient to take advantage of the known values M(l), . . . M(N). 
If N = 1 this is trivial, since the M(k) are all equal modulo p. The author thanks 
Kiran Kedlaya for suggesting the following interpolation method to handle the case 
N > 1. 

Consider the matrix 

F(s) = M l H {s - p + 1) • • • Mjjis -2g- 2), 

which is a matrix of polynomials in s. Expanding as a Taylor series in s, we obtain 

1 



M(k) = F{kp) = F(0) + F'(0)kp 



(N-l)\ 



Then by simple linear algebra, the values of F(kp) (mod p N ) for 1 < k < N 
determine completely the values of F^ {fi)p l / i\ for < i < N. Namely, we have 



/ F(p) \ 
F(2p) 



1 \ t 

2 N-1 1 



F(0) 
F'(0)p 



and the Vandermonde matrix is invertible modulo p (since p > N) . After solving for 
the F«(0y/«!, the remaining M(k) are computed by substituting the appropriate 
values of k into the above Taylor series. 

Remark. In the case N = 1 there is a yet faster method available for computing 
D(k) (although not M(k)). Namely, since t = —1/2 (mod p) we have 

p~2g-2 

D(k) = D(l)= Jl -2(2.g + l) -2s (mod j?), 

s=l 

which by Wilson's theorem is equal to 

p-i 

( _ 2)P -2 S -2 -Q ss (2 2 9+i(2 5 + l)!)- 1 (modp). 

s=2g+2 

7.2.2. Performing the reductions. Now we fix < i < 2g, and show how to compute 
Wij. We will define a sequence of differentials v i+ ^ 2g+l yj +l , . . . , vt, Vo, where v m S 
W mp —i t t, with the property that 



(12) 



i+(2g+l)i+l 
k—m 
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In particular we will have ~ Tij, so this vo is the Wij that we seek. The v m are 
computed with entries in R\. However, not all their p-adic digits will be correct; 
we will say more about this in a moment. 
Naturally, the sequence begins with 

v i+(2g+l)j+l = T it j ii+ (2g+l)j+l- 

Then, given v m , we compute v m -i as follows. We first move from W m p—i,t to 
W mp -2g-2,t, one step at a time, via the following sequence: 

G W mp -x t t, 

l^Mjjimp - l)v£> £ W mp - 2 ,t, 

2)- 1 M t H (mp~2)v$ eW mp _ 3 , u 



v^=D t H (rap- 
v^=D t H {mp- 



v (2g+2) = D t H ( mp _ 2g _ iyl M t( mp _ 2 g - 1)1,(29+1) g W mp -2g-2,t- 

Using the reduction matrices (fTTTl computed above, we set 

(13) v' m = D%{{m - l)p, mp-2g- 2)- 1 M t H ((m - l)p, mp -2g- 2)v^+ 2 \ 

and then take one final step to reach 

w m _i = Tij^-i + D^dm - l)p)- 1 M t H ((m - l)p)v' m g W( m _i) p _i, t . 

If all of the above computations are performed to infinite precision, then it follows 
from Propositions[(5]and[7]that if v m satisfies (|12p. then also v m -\ also satisfies (|T^|). 
and then by induction also vq satisfies (IT2t . 

Now we analyse the propagation of errors. To facilitate the analysis, we intro- 
duce the following terminology. Suppose that v is a vector of length 2g + 1, with 
coordinates in Ri. Let e(v) denote the error term associated to v. That is, e(v) 
is the difference between the value for v stored by the machine and the value that 
would have been obtained for v if all computations had been performed to infinite 
precision. We will say that v is (-correct if: 

• the ^-th coordinate of v is divisible by p; 

• the £-th coordinate of s(v) is divisible by p N+1 ; and 

• the remaining coordinates of e(v) are divisible by p N . 

Note that v i+ ( 2 g+i)j+i is 1-correct, since its first coordinate is simply Bj^ 2g +i)j, 
which has been computed in R\ and is divisible by p, and the other coordinates 
are all zero. The following series of claims together show that if v m is 1-correct, 
then also v m -i is 1-correct. Consequently vq is 1-correct, and in particular Wij is 
computed correctly to precision p , 

Claim 1. Let 1 < I < 2g. If Vm is (-correct, then d„ +1 ' is [i + l)-correct. 

Proof. We first examine the form of the matrix M l H (rnp~l). Let P(x), C(s, x) and 
Cft(s) be the polynomials introduced in Proposition [BJ We are taking s — mp — t = 
—£ (mod p) and t = —1/2 (mod p), so 

C(x,s) = -2lP{x) + 2xP'{x) (modp). 

In particular the coefficient Ce(s) of x e is zero modulo p, so the entry in the {£+ 1)- 
th row of the last column of M%{mp — £) is zero modulo p. Consequently the 

contribution to d„ +1 ' from the last entry of Vm satisfies the required conditions. 
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Furthermore, it is clear from Proposition [6] that the only other possibly nonzero 
entry in the (£+ l)-th row appears in the £-th column. Therefore also receives 

a contribution from the £-th entry of Vm , which by hypothesis already satisfies the 
required conditions. 

Finally, the denominator 

D^imp -£) = (2g + l)(2i - 1) - 2s = -2((2g + 1) - t) (mod p) 
is a unit, so dividing by it does not disturb ^-correctness. □ 

Claim 2. If Vm is (2g + \)-correct, then Dm 9+2 ' is correct modulo p N . 
Proof. Let w = M^(mp — 2g — \)vm 9+2 \ We have 

(14) D t H (mp-2g-l) = (2g + l)(2t-l)-2(mp-2g-l) = (mod p), 

so by the definition of M f H , the first 2g columns of M t H (mp—2g— 1) are zero modulo 
p. Since the first 2g coordinates of Vm S+1 ' are correct modulo p N , the contribution 
they make to w is divisible by p and correct modulo p N+1 . The contribution from 
the last coordinate of Vm 9+1 ' is by hypothesis already divisible by p and correct 
modulo p N+1 . 

We deduce that w is divisible by p and correct modulo p N+1 . It suffices now to 
show that the valuation of D t H (mp — 2g~l) is precisely 1. Since we know it is odd 
and divisible by p, we must bound its absolute value below p 2 . From (fT4"]l and the 
definition of t we have 

D^mp -2g-l) = ((2g + l)(2j + 1) - 2m)p, 

and then the desired result follows from ([1]), since < m < (2g +1)0 + 1) — 1- □ 

Claim 3. If Vm 9+2 ^ is correct modulo p N , then so is v' m . 

Proof. By (fl~3)l it suffices to show that Z)^((m — l)p,mp — 2g — 2) is a unit. The 
latter quantity is 

fcp-2g-2 p-2g-2 

Yl (2g + l){2t - 1) - 2s = Yl -2((2.g+l) + s) (mod p) 

s=(k-l)p+l a=l 

since t = — 1/2 (mod p), so it is a unit. □ 

Remark. In the above proof, we only needed the values of M| r ((m— l)p, mp—2g — 2) 
and D^((m — l)p,mp — 2g — 2) modulo p N , not modulo p N+1 . This is why it is 
possible to do almost all of the work in the horizontal reductions using only N 
digits. 

Claim 4. If v' m is correct modulo p N , then u m _i is l-correct. 

Proof. The same argument used in the proof of Claim [1] shows that the first row of 
Mjj^m— l)p) is entirely zero modulo p, and that -D#((m— l)p) is a unit. Therefore 
the contribution to v m -\ from v' m is l-correct. The contribution from Tij jm —i is 
also l-correct. □ 
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7.3. Vertical reduction phase. We first prove some lemmas that will be used to 
analyse the error propagation in the vertical reduction phase. 

Lemma 11. If t = 1/2 (modp), then My{t) is invertible modulo p. 

Proof. Under the hypothesis on t, it follows from the definition of My{t) that the 
entries of its (i + l)-th column are given by the coefficients of S'^x). To show 
that My{t) is invertible modulo p, it suffices to show that the Sf (x) are linearly 
independent over F q . If Y^i=o ^i^i( x ) = is some linear relation, then we may 
integrate (permissible, by (QJ) to obtain Xi^o* ^iSi(x) = \2 g for some A2 5 £ F q . 
Multiplying this by Q (x), and using ([7]), we obtain 

2g-l 

XiX 1 ee X 2g Q (x) (mod Q{x)). 

i=o 

But l,x,... ,x 2g ~ 1 , Q (x) are linearly independent in F q [x]/Q{x), since Q'(x) has 
degree 2g and unit leading term (again due to {TJ)). This forces every A; = 0. □ 

Remark. It would be interesting to characterise the values of t for which My (t) is 
singular modulo p. For instance, in the case of an elliptic curve, Example [5] shows 
that My(t) is singular precisely when t ee 7/6 (mod p) or t ee 5/6 (mod p). By 
studying the kernels and images of such maps, it may be possible to reduce the 
working precision in the vertical reduction steps from p N+1 to p , as was done for 
the horizontal reductions. 

Lemma 12. If to = —1/2 (mod p), then My {to, to + p) is zero modulo p. 

Proof. Since My {to, to +p) modulo p only depends on to modulo p, we may assume 
that t Q = (p- l)/2. 
Let 

X = Dy{t , t Q +p+ l^Mvfo, to+p+1) 

be the reduction map from W-i t t +p+i to W-i t t . First we will show that pX is 
integral. It is easy to check that p 2 X is integral, by inspecting the powers of p 
dividing Dy{to, to+p + 1), but the integrality of pX requires more work. The proof 
is very similar to the proof of [KedOll Lemma 2]. Let w £ W-i^ + P +i, say 

u = F{x)y~ 2{to+p+1) dx/y, 

where F £ Z q [x] has degree at most 2g — 1. Let r) — Xuj, and write 

X] = G{x)y~ 2to dx/y 

where G £ Q q [x] has degree at most 2g — 1. We need to show that pn is integral. 

Since X is a reduction map, to and rj are cohomologous, and the discussion 
preceding Proposition shows that w — rj = dH where 

to+P+l 

h= J2 H t {x)y- 2t+1 

t=t +l 

for some polynomials H t £ Q q [x] of degree at most 2g. We may now use the same 
argument as in the proof of [KedOll Lemma 2] (namely, comparing the y-expansions 
of lo, r\ and dH around each root of Q{x)) to deduce that mr\ is integral, provided 
that m/ {2t — 1) is integral for to < t < to + p + 1. In particular prj is integral, since 
we have assumed that to = {p — l)/2. 
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Now we may finish the proof of the lemma. We have 

X = D v (t Q ,t Q +p+ l^Mvfo, t +p)M v {t + P +1). 

By Lemma Qj] we know that My (to + p + l) is invertible modulo p, so its inverse is 
integral. Rearranging, we obtain 

My (t , t + p) = D v (t , t + P + 1 )X My (t + P + 1 ) ~ 1 • 

Note that Dy(to, to + p+ 1) = IIt="t I +i — 1) is divisible by p 2 , since the first and 
last factors in the product are zero modulo p. The integrality of pX then implies 
that My (to, to + p) is zero modulo p. □ 

Now we may describe the vertical reduction phase. The input consists of the 
differentials Wi_j computed via the horizontal reductions. The output will be a 
collection of differentials Wi G W-\jq for < i < 2g that are cohomologous to Tj, 
and correct modulo p N . 

The first step is to compute the vertical reduction matrices 



'My (0,^1) j = 0, 

m v ( (2j -; )p - i , ' 23+ ;"'- 1 ) i<j<N, 

and similarly for Dj, using Theorem 1101 with entries in Ri. The invertibility hy- 
potheses of Theorem [TU1 are satisfied, because the total reduction length K satisfies 

(2(JV-l) + l)p-l (2JV-l)p 

i\ = < . 

2 2 

The latter is bounded by p 2 /6 from ([T]), so certainly \/K + 1 < p. 

For j > 1, observe that has valuation precisely 1, because in the product 

i((2j+l)p-l) 

d j= n (a-i), 

t=|((2i-i)p+i) 

the only term divisible by p is the first one, and |T]) implies that it is less than 
p 2 . Furthermore, Mj is zero modulo p by Lemma [T"2l Since Mj and Dj have been 
computed modulo p N+1 , we can therefore compute the (integral) matrix 

Xj = DJ 1 Mj 

correctly modulo For the j = case, the product for Do shows that it is a 
unit, so Xo = Dq^^Mq may be computed modulo p N as well. Note that Xj is the 
vertical reduction map from W_ x i((2j+i) p -i) to W_ 1 i^ 2 j~i)p-i) f° r i — 1j an d to 
W u, for j = (). 

Now we fix < i < 2g, and show how to compute We define a sequence of 
differentials 

UJV-I = Wi,N-l € IV-^i^JV-l^-l)! 

= Wi,N-2 + € W-l,i((2JV-3)p-l)! 



vq = w. l o + ^i^i e W / _ 1 i (p _ 1) . 
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Using Proposition 21 one checks by induction that 

for each 1 < m < N — 1, and that the coefficients of w m are correct modulo p N . 
Finally one puts Wi = XoVq € W-i t o, which by Proposition [4] is cohomologous to 
Ti, and again its coefficients are correct modulo p N . 

Remark. In the case N = 1, it is only necessary to compute Mq modulo p, rather 
than modulo p 2 as described above, since no divisions by p are involved at all. It 
is not clear to the author whether a similar optimisation is available when N > 1. 

7.4. Complexity analysis. 

Proof of Theorem fJl We first consider the time spent in the applications of The- 
orem I10[ which will be the dominant step when p is large compared to N and 
g. For both Rq — Z p n/(p N ) and R\ = Z p ™ / (p N+1 ), basic ring operations (ad- 
dition, multiplication) have bit-complexity 0(Nn log p), and the costs of poly- 
nomial and matrix arithmetic over Ri are given by M(e?) = 0(dNn\ogp) and 
MM(m) = 0(m"7Vn \ogp). For the horizontal reductions, for each of N rows, 
we applied Theorem [10] with K = 0(pN) and m — 0(g). Therefore each row costs 
0(j) 1 / 2 N 3 ^ 2 g u 'n). For the vertical reductions, we applied Theorem [TOl once, also 
with K = 0(pN) and m = 0(g). Therefore the total time is d(p 1/2 N 5 / 2 g UJ n). 

Now we estimate the time for the remaining steps, which for sufficiently large p 
will be negligible. 

Computing the coefficients Cj iT in Proposition [5] requires only 0(N 2 g 2 ) ring op- 
erations, even if naive polynomial multiplication is used. In the formulae for the 
Bj^ r , computing all the necessary binomial coefficients requires 0(N 2 ) ring opera- 
tions, and then computing all the Bj r requires 0(N 2 g) ring operations. Therefore 
computing the Bj r requires 0(N 2 g 2 ) ring operations altogether. 

Solving ((7| for each i requires 0(g 2 ) ring operations, even by the naive Euclidean 
extended GCD algorithm, so computing the coefficients of My(<) needs at most 
0(g 3 ) ring operations. Computing the coefficients of M^(s) for each of the N 
required values of t requires 0(Ng) ring operations. 

In the horizontal reduction phase, computing the inverse of the Vandermonde 
matrix requires 0(N 3 ) ring operations. Then for each of N rows we must perform 
the following steps. First, compute the values of ^^^(0)^/*'; costing 0(N 2 g 2 ) ring 
operations. Then use these values to compute M(k) = F[kp) for O(Ng) values of 
k; for each k this costs 0(Ng 2 ) ring operations, so over all k this costs 0(N 2 g 3 ). 
The total cost over all rows is 0(N 3 g 3 ) ring operations. 

Finally we must account for the 'single step' reductions during the horizontal 
reduction phase, as these were performed without the assistance of Theorem [TO] 
Each matrix-vector multiplication requires only O(g) ring operations, due to the 
sparsity of the matrices. For each of N rows, for each of O(Ng) values of m, and 
for each of 0(g) values of i, there are O(g) such steps, for a total cost of 0(N 2 g i ) 
ring operations. 

Altogether the cost is 0(N 3 g 4 ) ring operations, with corresponding bit-complexity 
0(iV 4 5 4 n log p). □ 
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8. Sample computations 

The author implemented the main algorithm in CH — h, only for the case n = 1, 
using Victor Shoup's NTL library ( |Sho07j . version 5.4) for the polynomial arith- 
metic. The implementation uses the middle product algorithm [HQZ04] for the key 
polynomial shifting step, as suggested in |BGS07| p. 1786]; this was made trivial 
thanks to Shoup's wonderfully modular FFT code. The matrix multiplication steps 
use the naive 0(n 3 ) algorithm. 

The source code is freely available under a GPL license from the author's web 
page, http://math.harvard.edu/~dmharvey/. The functionality has been made 
available in the SAGE computer algebra system (version 2.5.1 and later) [SJ05J. 
An example session: 

sage: R.<x> = PolynomialRing(ZZ) 

sage: from sage . schemes .hyperelliptic_curves . frobenius import frobenius 

sage: frobenius (p = 10007, N = 3, Q = x~5 + 2*x + 1) 
[844821791581 220205295882 761288372988 276316151941] 
[380371243619 656847071320 602083441024 781051879529] 
[435515877861 568305615656 204167847992 67069787872] 
[365277275232 293850471444 438804747301 298366229783] 

The following sample computations were performed on a 1.8 GHz AMD Opteron 
processor running Linux; many thanks to William Stein for offering this machine 
for the computations. The machine has 64GB of RAM and 16 cores, but only a 
single core was used. The compiler used was GCC 4.1.2 with optimisation flag -03, 
and NTL was linked with the GMP library (version 4.2.1, with Pierrick Gaudry's 
AMD assembly patch) for the underlying integer arithmetic. 

8.1. Dependence on p. Table [T] shows the time used to compute the Frobenius 
matrix over a range of p for the genus two curve y 2 — x 5 — 11a: 4 + 7a: 3 — 5x 2 + 
3x — 2, with precision N = 3. From Theorem [TJ one expects the running time to 
approximately double for every four-fold increase in p. 
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11.1 
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2 42 
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13.5 hours 



Table 1. Running times for g = 2 and N = 3 



8.2. Near-cryptographic sizes. For the purposes of constructing secure cryp- 
tosy stems, it is useful to be able to determine the zeta function of a hyperelliptic 
curve C of low genus over a large prime field [CFA+061 Ch. 23]. In particular 
one hopes to find a curve whose Jacobian order #J(C/F p ) is prime (or is a prime 
multiplied by a very small integer) and sufficiently large. 

For genus three and four, we ran our implementation on a single curve over the 
largest prime field that seemed feasible with the given hardware. We were able to 
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determine the zeta function for a curve whose Jacobian approaches a cryptograph- 
ically useful size, although there is still a gap to overcome. Handling a genus two 
curve with a large enough Jacobian is clearly out of reach of this technique. 

Thanks to Kiran Kedlaya for his assistance in using the MAGMA computer 
algebra system to perform some of the computations below. 

8.2.1. Genus three. We computed the characteristic polynomial of Frobenius mod- 
ulo p for the curve 

y 2 = x 7 + 17x 6 + 13x 5 + llx A + 7.x 3 + hx 2 + 3x + 2 

defined over F p where p = 2 50 — 27. The running time was 40 hours, and peak 
memory usage was 16 GB. 

This determines #J(C/F p ) modulo p, within an interval of width 0(p 3 ' 2 ). The 
search space is only 0(p 1 ^ 2 ), so MAGMA's baby-step/giant-step implementation is 
easily able to recover the Jacobian order. From this we inferred that the character- 
istic polynomial of Frobenius is 

{X 6 +p 3 ) + ai {X 5 + p 2 X) + a 2 (X 4 + pX 2 ) + a 3 X 3 , 

where 

oi = -8207566, 
a 2 = 336549388766991, 
a 3 = 17004180735172175425188. 
The order of the Jacobian over F p is 

1427247682301531613968301082755745957628851920 - 2 150 . 



8.2.2. Genus four. We computed the characteristic polynomial of Frobenius mod- 
ulo p 2 for the curve 

y 2 = x 9 - 23x 8 + 19x 7 - 17x e + 13a; 5 - 11.t 4 + 7x 3 - bx 2 + 3x - 2 

defined over F p where p = 2 44 + 7. The running time was 45 hours, and peak 
memory usage was 34 GB. 

This does not pin down the zeta function precisely, but it produces a short list of 
four candidates, which we checked in MAGMA by testing which proposed Jacobian 
order m satisfied raP = for a number of random points P defined over F p . We 
found that the characteristic polynomial of Frobenius is 

(A 8 +p 4 ) + ai (X 7 +p 3 X) + a 2 (X 6 + p 2 X 2 ) + a 3 (X 5 +pX 3 ) + a 4 X\ 

where 

a x = 2394254, 

a 2 = 29576915959850, 

a 3 = 88182558522652238508, 

a 4 = 536178748943545477971279916. 

The order of the Jacobian over F p is 

95780984339838343855809310281601230464609800042292722 - 2 176 . 
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